Read Assembly
Volume Number: 9
Issue Number: 5
Column Tag: Assembly Workshop
The Secrets of the Machine 
Or, how to read Assembler
By Malcolm H. Teas, Rye, New Hampshire
About the author
Malcolm H. Teas, 556 Long John Road, Rye, NH 03870 Internet:
mhteas@well.sf.ca.us
AppleLink: mhteas@well.sf.ca.us@INTERNET#
America Online: mhteas
Why Read Assembler?
When we write programs in C or Pascal, what we’re really doing is writing in
the computer’s second language. When I studied French I was always translating it into
English in my head to understand it. Well, that’s just what the computer’s doing. It’s
taking C, Pascal, or whatever you’re programming with and translating it to its native
language - assembler. Translation is the compiler’s job. But just as my
French translations would lead to errors and awkward speech, the compiler can
occasionally make mistakes, and the code that it creates from your source is
a little awkward too - it isn’t always the most efficient. Sometimes this doesn’t matter
that much since the CPU is quite fast. However, if you’re writing a time-critical
algorithm, if the speed of your application just isn’t what you want it to be, or if you
suspect that there’s some strange error, then it’s time to talk to the machine in its
native language. This article is a traveller’s phrasebook.
Where do I find assembler?
Although this wasn’t always so, these days you can usually find some way to examine
either the translated assembler code for your source or your disassembled application.
If you’re using the latest version of Think C (version 5.0), then try the
“Disassemble” item in the “Source” menu. This generates the translated assembler
version of the source in the front window. The new window that results shows the
assembler and can be printed, saved and otherwise treated as any other Think C editor
window.
MPW (Macintosh Programmer’s Workshop) also offers a number of tools to get
at your assembler listings. The DumpCode tool takes any type of code resource and
disassembles it. It also can list the jump table and other information that is included.
I’ll talk about jump tables when I cover the memory map of an application. If you use
the SourceBug debugger, it has an option to view source as either the original language
or as assembler.
ResEdit now has an external that disassembles a code resource when you open it.
It is quite helpful in finding the targets of jumps and other memory addresses, because
it shows them graphically with arrows. Unfortunately, it doesn’t permit editing of the
assembler. While this external is not officially supported by Apple, it has worked
well for me.
If you cannot get any of these, you can always use a low-level debugger like
MacsBug or TMON. By getting into the debugger while in your application, you can
disassemble the code you’re interested in and save it to a file. In MacsBug, you’d use
the “ip” command to disassemble around the program counter, then use the “log”
command to save the screen to a file.
What the computer really looks like.
When you read assembler, you see instructions that refer to registers and memory
locations and that have an unusual syntax. The programming environment for assembler
is the bare machine, so it has some constraints. To learn to read assembler, you need to
know something about that environment. Actually, this is the hard part; reading the
assembler instructions is easy.
First are the registers. The CPUs (central processing units) used these days
have a number of registers to hold data or addresses currently being
used by the program. The Motorola chips used in the Macintosh (the 680x0 family)
have eight data registers, eight address registers, a program counter (also called a
PC), and a condition code or status register. The 68020 and later chips have some
other specialized registers used by the Mac’s operating system for handling
interrupts, mapping memory, and managing the CPU’s cache. However, these are only
used in the operating system and are not interesting to the application programmer.
Data registers are used mostly by instructions that manipulate data, like the
logical and arithmetic instructions. Address registers are used to address locations in
the computer’s memory. They’re often used to index data and can be used in “move”
instructions to help calculate the memory location of data. Address register seven
(A7) is used by the CPU as a stack pointer. Some instructions can address data on the
stack and automatically push or pop the stack. The Mac operating system has a
convention to use address register five (A5) as the pointer to the top of an
application’s global data and to use A6 (address register six) as the stack frame
pointer. I’ll cover more of the stack frame and global data space later.
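To make the register set and the automatic push and pop a little more concrete, here is a minimal sketch in C. It is only a model under stated assumptions: the names Cpu, mem, push_long, and pop_long are invented for this illustration, and the 1K array stands in for memory rather than reflecting any real Mac layout. The two routines imitate what MOVE.L D0,-(A7) and MOVE.L (A7)+,D0 do: the first decrements A7 and then stores, the second loads and then increments A7.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 1024u   /* toy memory size (assumption), not a real Mac layout */

/* Hypothetical model of the registers an application sees. */
typedef struct {
    uint32_t d[8];       /* D0-D7: data registers */
    uint32_t a[8];       /* A0-A7: address registers; a[7] is the stack pointer */
    uint32_t pc;         /* program counter */
    uint16_t sr;         /* status register; the condition codes are in the low byte */
} Cpu;

uint8_t mem[MEM_SIZE];   /* memory as one big array of bytes */

/* Imitates MOVE.L Dn,-(A7): predecrement A7 by 4, then store the long
   word with the most significant byte first (the 680x0 is big-endian). */
void push_long(Cpu *cpu, uint32_t value)
{
    assert(cpu->a[7] >= 4u && cpu->a[7] <= MEM_SIZE);
    cpu->a[7] -= 4u;
    uint8_t *p = &mem[cpu->a[7]];
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}

/* Imitates MOVE.L (A7)+,Dn: load the long word, then postincrement A7 by 4. */
uint32_t pop_long(Cpu *cpu)
{
    assert(cpu->a[7] <= MEM_SIZE - 4u);
    const uint8_t *p = &mem[cpu->a[7]];
    uint32_t value = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                   | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    cpu->a[7] += 4u;
    return value;
}
```

With A7 set to MEM_SIZE (an empty stack at the top of the toy memory), a push moves A7 down by four bytes and the matching pop moves it back up, which is why the stack is said to grow downward.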
The program counter (PC) is a special register that holds the address of the next
instruction to execute. The status register (SR; its low byte is the CCR, or Condition
Code Register) holds flags showing the results of the last data operation: zero,
negative, overflow, carry, etc. These are tested by the conditional branching
instructions that implement “if” statements, loops, and multi-way ifs like the
C “switch” statement.
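As a rough illustration of how a compare followed by a conditional branch implements an “if”, the following C sketch computes the condition codes the way a CMP.L does and then tests them the way BEQ and BLT do. The function names are invented for this example, and the sketch ignores the X flag, which CMP does not change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition code bits in the low byte of the status register. */
#define CCR_C 0x01u   /* carry (here: borrow) */
#define CCR_V 0x02u   /* signed overflow */
#define CCR_Z 0x04u   /* result was zero */
#define CCR_N 0x08u   /* result was negative */

/* Flags as set by CMP.L src,dst, which computes dst - src and
   throws the result away, keeping only the condition codes. */
uint8_t cmp_long(uint32_t src, uint32_t dst)
{
    uint32_t result = dst - src;
    uint8_t ccr = 0;
    if (result == 0u)                                   ccr |= CCR_Z;
    if (result & 0x80000000u)                           ccr |= CCR_N;
    if (src > dst)                                      ccr |= CCR_C;
    if (((dst ^ src) & (dst ^ result)) & 0x80000000u)   ccr |= CCR_V;
    return ccr;
}

/* BEQ branches when Z is set. */
bool beq_taken(uint8_t ccr) { return (ccr & CCR_Z) != 0; }

/* BLT (signed less than) branches when N and V differ. */
bool blt_taken(uint8_t ccr)
{
    return ((ccr & CCR_N) != 0) != ((ccr & CCR_V) != 0);
}

/* What "if (a < b)" boils down to: CMP.L b,a followed by BLT. */
bool signed_less_than(int32_t a, int32_t b)
{
    return blt_taken(cmp_long((uint32_t)b, (uint32_t)a));
}

/* A few spot checks of the behaviour described above. */
void ccr_demo(void)
{
    assert(beq_taken(cmp_long(5u, 5u)));
    assert(signed_less_than(-1, 1));
    assert(!signed_less_than(1, -1));
}
```

Testing N against V, rather than N alone, is what lets BLT give the right answer even when the subtraction overflows.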
The memory of the Mac, to an assembly language programmer, just looks like a
big array. Some of this array holds the program, some holds the system, some holds
the application’s data, and some is used by other applications. This explains why
one application’s bugs can cause problems for other applications. The first
application can overwrite the contents of memory anywhere so that data or code for
another application can be damaged too. As a result, keeping track of pointers and
handles is quite important.
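The “big array” view can be sketched in a few lines of C. Everything here is hypothetical: the 64-byte memory, the two 32-byte regions standing in for two applications, and the names poke and app_a_fill. The point is only that a store is a store, and nothing in the array itself stops one application from writing past its own region.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE   64u   /* toy memory (assumption) */
#define APP_A_BASE  0u   /* hypothetical region for application A */
#define APP_A_SIZE 32u
#define APP_B_BASE 32u   /* hypothetical region for application B */
#define APP_B_SIZE 32u

uint8_t memory[MEM_SIZE];

/* An unchecked store, much like a MOVE.B to an arbitrary address.
   The assert only keeps this C sketch inside its own array. */
void poke(uint32_t address, uint8_t value)
{
    assert(address < MEM_SIZE);
    memory[address] = value;
}

/* Application A fills its buffer. If count is larger than APP_A_SIZE,
   the extra bytes land in application B's region. */
void app_a_fill(uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        poke(APP_A_BASE + i, 0xAA);
}
```

Calling app_a_fill(APP_A_SIZE + 1u) overwrites the first byte of B’s region, which is the kind of damage a stray pointer or stale handle can do.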
But for an application, the Mac’s memory is organized into application memory
areas which hold the heap, stack, and global data. Any application is expected to stay in
its own area. Low memory belongs to the interrupt table and system globals. Above
that is the system heap, followed by the MultiFinder area. In high memory are the
address locations of the cards and I/O devices that the Mac is equipped with. The Mac
operating system divides the MultiFinder area into application memory areas or
partitions, one for each application in memory at the time. The size of an application’s
partition is determined from the ‘SIZE’ resource when the application is launched. If
there is no ‘SIZE’ resource, a default partition size of 512K bytes is used. The
partition size can be changed by the user in the Finder’s “Get Info” box. This creates
a new ‘SIZE’ resource. (See Inside Mac VI page 5-14 for more information on the
‘SIZE’ resource.)
This application memory partition is, in turn, subdivided into the application’s
heap, stack, and global area. Your application’s code, opened resources, handle and
pointer blocks are all in the heap which occupies the bottom part of the partition and
may grow upward. The jump table, global variables, and the QuickDraw application
globals are all stored in the global area at the top of the partition. Register A5 (by
convention) points into this area, at the top of the application’s globals. When a
routine references global data, it’s done as a negative offset from register A5. Due to
how this addressing mode is coded in instructions, this makes the maximum size of the
application globals 32K. Although some compilers have ways around this limit, it’s
best to stay under it: larger global areas are more difficult to access and make your
program less efficient. Parameters for routines and local variables are stored on the
stack, which grows downward in memory and is located just beneath the QuickDraw
application globals.
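The 32K figure follows from the size of the offset field: in this addressing mode the instruction carries a 16-bit signed displacement that is sign-extended and added to the address register. The C sketch below works through that arithmetic. The function names and the sample A5 value are assumptions made for the illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for the d16(An) addressing mode: the address
   register plus a sign-extended 16-bit displacement. */
uint32_t ea_disp16(uint32_t an, int16_t disp)
{
    return an + (uint32_t)(int32_t)disp;
}

/* Address of a global stored bytes_below_a5 bytes under A5.
   A single 16-bit displacement reaches at most 32768 bytes down. */
uint32_t global_address(uint32_t a5, uint32_t bytes_below_a5)
{
    assert(bytes_below_a5 >= 1u && bytes_below_a5 <= 32768u);
    return ea_disp16(a5, (int16_t)-(int32_t)bytes_below_a5);
}
```

For example, with A5 holding 0x00100000, an operand written as -4(A5) refers to address 0x000FFFFC, and the farthest global one instruction can reach sits at 0x000F8000.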
The jump table is fixed in memory for the life of the application and so is used to
get around the 32K limit on ‘CODE’ resources and to allow them to be moved in
memory. When a routine is called that isn’t in the same code resource as the calling
routine, the compiler and linker make a jump table entry. This is a jump instruction to
the other code resource. So, the calling routine does a JSR (Jump to Subroutine) to
the address in the jump table, and the jump table then jumps control to the location in
the new code resource. When a code resource is moved in memory, the jump table is
corrected. This also allows the Segment Loader manager (part of the Mac Toolbox) to